Multiple imputation of missing values in the wave 2007 of the IAB Establishment Panel

Author

  • Jörg Drechsler
Abstract

The basic concept of multiple imputation is straightforward and easy to understand, but applying it to real data poses many implementation problems. Defining useful imputation models for a dataset that consists of both categorical and continuous variables with distributions that are anything but normal, and that contains skip patterns and all sorts of logical constraints, is a challenging task. In this paper, we review different approaches to handling these problems and illustrate their successful implementation in a complex imputation project at the German Institute for Employment Research (IAB): the imputation of missing values in one wave of the IAB Establishment Panel.

Zusammenfassung (translated from German): The basic idea of multiple imputation is easy to understand, but applying the method to real datasets confronts the user with a number of additional challenges. Many datasets consist of both categorical and continuous variables, and the latter can be anything but normally distributed. In addition, filter questions and various logical constraints complicate model building. In this paper, we present different ways of dealing with these challenges and illustrate a successful implementation using a complex imputation project at the Institute for Employment Research (Institut für Arbeitsmarkt- und Berufsforschung, IAB): the imputation of missing values in one wave of the IAB Establishment Panel (IAB Betriebspanel).

JEL classification: C52, C81, C83
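The "basic concept" the abstract refers to is usually described as three steps: create m completed datasets, analyze each one separately, and combine the m results. As a minimal sketch of the combining step only (commonly known as Rubin's rules), the snippet below pools point estimates and their variances. The function name `rubin_combine` and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def rubin_combine(estimates, variances):
    """Pool m point estimates and their variances with Rubin's combining rules.

    estimates: the point estimate obtained from each completed dataset
    variances: the estimated variance of that point estimate in each dataset
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = q.mean()            # pooled point estimate
    within = u.mean()           # average within-imputation variance
    between = q.var(ddof=1)     # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical estimates from m = 3 completed datasets
pooled, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The between-imputation term is what distinguishes multiple from single imputation: it carries the extra uncertainty due to the missing values into the final variance estimate.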

Related articles

Disclosure Control in Business Data - Experiences with Multiply Imputed Synthetic Datasets for the German IAB Establishment Survey

Generating synthetic datasets based on the ideas of multiple imputation is an innovative method for statistical disclosure control. The basic idea is to replace the values for some confidential variables X with several draws from the posterior predictive distribution of X given some non-confidential variables Y. Since the synthetic values are based on models for the joint distribution of the da...

Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Most geochemical datasets include missing data in varying proportions, and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...

Several approaches to handling missing values of quantitative variables and their effect on the results of a clinical trial

Background and Objectives: A major challenge affecting longitudinal studies is missing data. Missing data may cause part of the information to be lost, which reduces the accuracy of the estimator and can make the results biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism in longitudinal research and to consider thi...

Editing and multiply imputing German establishment panel data to estimate stochastic production frontier models

This paper illustrates the effects of item nonresponse in surveys on the results of multivariate statistical analysis when the task is to estimate productivity. To multiply impute the missing data, a data augmentation algorithm based on a normal/Wishart model is applied. Data of the German IAB Establishment Panel from waves 2000 and 2001 are used to estimate the establishment’s productivity. T...

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background: Policy makers need models to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. The presence of missing data challenges the practice of model development. Several studies have suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One of the issues which was of less concern...

Publication date: 2010